Scale service destination based on available memory #11739

Merged


lahsivjar
Contributor

Closes #11721

@lahsivjar lahsivjar requested a review from a team as a code owner September 28, 2023 15:53
s.config.Aggregation.ServiceTransactions.MaxGroups, memLimitGB,
if s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 {
// scale based on available memory considering 5K groups for 1GB
s.config.Aggregation.ServiceDestinations.MaxGroups = linearScaledValue(5_000, memLimitGB)
Contributor Author

[For reviewers] I'm not sure what a good value is here. The previous default was 10K for all, so we will now have much greater values. However, service destination is not very costly, so I kept it at 5k/GB. Let me know if others have any concerns.

Member

Question: if it was a constant 10k for all, do we consider this a breaking change for 1GB users? Do we accept the risk?

Contributor

Without indicators that current limits were too high, I'd keep the 10k for 1GB, and then afterwards start 5k steps per GB.

Member

service destination doesn't use histograms and only uses 2 float64, which makes it very light on memory. +1 on keeping 10k for 1GB.
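To put the "2 float64" point in numbers, here is a rough back-of-the-envelope sketch. The struct and its field names (`count`, `sum`) are hypothetical stand-ins, not the actual apm-server types, and the figure covers only the two values per group; keys, map overhead, and any other bookkeeping are deliberately ignored.

```go
package main

import (
	"fmt"
	"unsafe"
)

// destinationValues is a hypothetical stand-in for the per-group payload
// described in the comment above: two float64 values and no histogram.
type destinationValues struct {
	count float64
	sum   float64
}

// valueBytes returns the bytes taken by the two float64 values alone for the
// given number of groups. Keys and container overhead are not counted.
func valueBytes(groups int) int {
	return groups * int(unsafe.Sizeof(destinationValues{}))
}

func main() {
	fmt.Println(valueBytes(10_000)) // 16 bytes per group * 10,000 groups
}
```

Under these assumptions, the values for 10k groups come to 160,000 bytes, which is small next to a 1GB memory limit.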

Contributor Author

Starting with 10K for 1GB sounds good to me. One potential drawback: if we introduce summaries or histograms into the service destination metrics in the future, we might need a breaking change to reduce the limits. But it is probably better to make a breaking change then than now.

> afterwards start 5k steps per GB.

Should we make it 10K per GB instead or will that be too high for bigger APM servers?

Member

> Should we make it 10K per GB instead or will that be too high for bigger APM servers?

It wouldn't be a problem now as there are no histograms. But as you've said, if we were to introduce histograms, it would be 2x the other limits and would require a breaking change to reduce them, though we would then have a pressing reason to do so. So I'm fine with 10k per GB.

Contributor

I'd start with 10K for 1GB and then 5K for additional GB.

Contributor Author

Updated the code to use 10k for 1GB and then 5k per additional GB for the service destination limit.
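The rule settled on in this thread (10k groups for the first 1GB, then 5k for each additional GB) can be sketched as follows. This is an illustration only: `serviceDestinationMaxGroups` is a hypothetical helper, not the code merged in this PR, and the handling of memory limits below 1GB and of fractional GB values is an assumption of the sketch.

```go
package main

import "fmt"

// serviceDestinationMaxGroups is a hypothetical helper illustrating the rule
// discussed in the review: 10k groups for the first 1GB of memory, then 5k
// more for each additional GB. Treating anything below 1GB as 1GB is an
// assumption made for this sketch.
func serviceDestinationMaxGroups(memLimitGB float64) int {
	const (
		baseGroups  = 10_000 // limit at 1GB, matching the previous fixed default
		perGBGroups = 5_000  // increment for each GB beyond the first
	)
	if memLimitGB < 1 {
		memLimitGB = 1
	}
	return baseGroups + int((memLimitGB-1)*perGBGroups)
}

func main() {
	for _, gb := range []float64{1, 2, 8} {
		fmt.Printf("%.0fGB -> %d groups\n", gb, serviceDestinationMaxGroups(gb))
	}
}
```

With this shape, a 1GB deployment keeps the earlier 10k limit, which addresses the breaking-change concern raised above, while larger deployments grow in 5k steps (15k at 2GB, 45k at 8GB).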

@mergify
Contributor

mergify bot commented Sep 28, 2023

This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) Sep 28, 2023
Member

@carsonip carsonip left a comment

  • this change may be breaking for 1GB
  • please also update docs/data-model.asciidoc

@lahsivjar lahsivjar requested review from simitt and carsonip October 2, 2023 09:15
carsonip
carsonip previously approved these changes Oct 2, 2023
internal/beater/beater.go (two outdated review threads, resolved)
@lahsivjar lahsivjar merged commit 60f6ac5 into elastic:main Oct 2, 2023
@lahsivjar lahsivjar deleted the fix-span-destination-scaling-11721 branch October 3, 2023 01:56
@carsonip carsonip self-assigned this Oct 19, 2023
@carsonip
Member

carsonip commented Oct 19, 2023

Testing notes:

❌ test-plan-regression

  • The log line "Aggregation.ServiceDestinations.MaxGroups set to %d based on %0.1fgb of memory" was not found in the logs. This is probably because a default value is combined with the s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 check, so the scaling branch never runs.
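The suspected cause in the testing note can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the apm-server code: `config` and `applyMemoryScaling` are hypothetical, and the 5_000-per-GB figure is used only to make the example concrete. It shows why a `<= 0` guard never fires once a non-zero default has already been filled in.

```go
package main

import "fmt"

// config is a hypothetical, simplified stand-in for the aggregation settings.
type config struct {
	MaxGroups int
}

// applyMemoryScaling mirrors the shape of the check quoted in the review:
// the limit is derived from memory only when no value is set. It reports
// whether scaling was applied, which is when the log line would be emitted.
func applyMemoryScaling(c *config, memLimitGB float64) bool {
	if c.MaxGroups <= 0 {
		c.MaxGroups = int(5_000 * memLimitGB) // illustrative scaling only
		return true
	}
	return false
}

func main() {
	withDefault := config{MaxGroups: 10_000} // non-zero default already filled in
	unset := config{}                        // zero value means "not configured"

	fmt.Println(applyMemoryScaling(&withDefault, 4), withDefault.MaxGroups)
	fmt.Println(applyMemoryScaling(&unset, 4), unset.MaxGroups)
}
```

In the first case the guard is skipped and the default of 10000 is kept, so no scaling (and no log line) occurs; in the second the limit is derived from memory. This matches the "code is never executed" description in the follow-up fix, though the exact mechanism in the real code is not shown on this page.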

carsonip added a commit to carsonip/apm-server that referenced this pull request Oct 19, 2023
Fix regression introduced in elastic#11739

- Fix bug where code is never executed
- Fix wrong log message
simitt pushed a commit that referenced this pull request Oct 20, 2023
* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test
mergify bot pushed a commit that referenced this pull request Oct 20, 2023
* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test

(cherry picked from commit 791f582)
mergify bot added a commit that referenced this pull request Oct 23, 2023
…11906)

* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test

(cherry picked from commit 791f582)

Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@lahsivjar
Contributor Author

Tested via the regression fix PR: #11905 (comment). Marking as test-plan-ok now.

Labels: backport-skip (Skip notification from the automated backport with mergify), test-plan, test-plan-ok, v8.11.0

Linked issue (may be closed by this pull request): Scale max service destination and max per service service destination aggregations based on available memory